Statistical Analysis of Semi-Supervised Regression

Authors

  • John D. Lafferty
  • Larry A. Wasserman
Abstract

Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer a new perspective on the problem of semi-supervised learning.


Similar resources

Covariance Operator Based Dimensionality Reduction with Extension to Semi-Supervised Settings

We consider the task of dimensionality reduction for regression (DRR) informed by real-valued multivariate labels. The problem is often treated as a regression task where the goal is to find a low-dimensional representation of the input data that preserves the statistical correlation with the targets. Recently, Covariance Operator Inverse Regression (COIR) was proposed as an effective solution t...


Graph-based Semi-Supervised Regression and Its Extensions

In this paper we present a graph-based semi-supervised method for solving regression problems. In our method, we first build an adjacency graph on all labeled and unlabeled data, and then combine the graph prior with the standard Gaussian process prior to infer the training model and predictive distribution for semi-supervised Gaussian process regression. Additionally, to further boost the lea...


Learning Structured Classifiers for Statistical Dependency Parsing

In this thesis, I present three supervised and one semi-supervised machine learning approach for improving statistical natural language dependency parsing. I first introduce a generative approach that uses a strictly lexicalised parsing model where all the parameters are based on words, without using any part-of-speech (POS) tags or grammatical categories. Then I present an improved large margi...


Semi-Supervised Logistic Regression

Semi-supervised learning has recently emerged as a new paradigm in the machine learning community. It aims at simultaneously exploiting labeled and unlabeled data for classification. We introduce here a new semi-supervised algorithm. Its originality is that it relies on a discriminative approach to semi-supervised learning rather than a generative approach, as is usually the case. We present ...


Semi-supervised Penalized Output Kernel Regression for Link Prediction

Link prediction is addressed as an output kernel learning task through semi-supervised Output Kernel Regression. Working in the framework of RKHS theory with vector-valued functions, we establish a new representer theorem devoted to semi-supervised least-squares regression. We then apply it to get a new model (POKR: Penalized Output Kernel Regression) and show its relevance using numerical experi...




Publication date: 2007